Abstract

Background: Doubly Uniparental Inheritance (DUI) is a fascinating exception to the matrilineal inheritance of
mitochondrial DNA (mtDNA). Species with DUI are characterized by two distinct mtDNAs that are inherited either
through females (F-mtDNA) or through males (M-mtDNA). DUI sex-linked mitochondrial genomes share several
unusual features, such as additional protein coding genes and unusual gene duplications/structures, which have
been related to the functionality of DUI. Recently, new evidence for DUI was found in the mytilid bivalve
Musculista senhousia. This paper describes the complete sex-linked mitochondrial genomes of this species.

Results: Our analysis highlights that both M and F mtDNAs share roughly the same gene content and order, but
with some remarkable differences. The Musculista sex-linked mtDNAs have differently organized putative control
regions (CR), which include repeats and palindromic motifs, thought to provide sites for DNA-binding proteins
involved in the transcriptional machinery. Moreover, in male mtDNA, two cox2 genes were found, one (M-cox2b)
123 bp longer.

Conclusions: The complete mtDNA genome characterization of DUI bivalves is the first step to unravel the
complex genetic signals allowing Doubly Uniparental Inheritance, and the evolutionary implications of such an
unusual transmission route in mitochondrial genome evolution in Bivalvia. The observed redundancy of the
palindromic motifs in Musculista M-mtDNA may have a role in the process by which sperm mtDNA becomes
dominant or exclusive in the male germline of DUI species. Moreover, the duplicated M-cox2b gene may have a
different, still unknown, function related to DUI, in accordance with what has already been proposed for other DUI
species in which a similar cox2 extension has been hypothesized to be a tag for male mitochondria.



Background 

Metazoan mitochondrial DNA (mtDNA) is generally a small molecule (15-20 kb), and although much larger
mitochondrial genomes have occasionally been found, they are often products of duplications of mtDNA
portions, rather than variations in gene content [1,2]. The typical mitochondrial gene complement encodes
13 protein subunits of the oxidative phosphorylation enzymes, 2 rRNAs and 22 tRNAs. However, the coding
sequences (CDS) can be up to 16, the tRNAs up to 27 (source MitoZoa: http://mi.caspur.it/mitozoa; see [3]),
and the rRNAs can be duplicated and/or fragmented in discontinuous genes, as in oysters [4]. Generally,
there is also a single large non-coding region that is known to contain regulatory elements for replication
and transcription (the 'Control Region', CR), but it is unclear whether it is homologous among distantly
related animals or, alternatively, it arose independently from various non-coding sequences. This difficulty
in establishing homology arises because CRs share sequence similarity only among closely related taxa.
Finally, the mtDNA is almost always a circular molecule: only the cnidarian classes Cubozoa, Scyphozoa and
Hydrozoa have been found to have linear mtDNA chromosomes [5]. All metazoan mitochondrial genes have
homologs in plants, fungi and/or protists [6-9].

The Mollusca is the second largest animal Phylum and currently 99 complete mitochondrial genomes are
available in GenBank; among those, only 38 are from Bivalvia, the second class in terms of species richness
among mollusks. So far, bivalve mtDNA displays an extraordinary amount of variation in gene arrangement,
i.e. very few shared gene boundaries are detectable, and gene translocations are common across all gene
classes (protein-coding genes, tRNAs and rRNAs). For this reason, bivalve mitochondrial genomes may provide
an excellent experimental system to review and test models of mt gene rearrangement evolution, which were
mainly developed in groups with stable genomes, such as vertebrates or arthropods. In addition, gene
duplications and/or losses are present in almost every bivalve taxon in which a complete mitochondrial
genome is available (see [10]). It is therefore evident that efforts should be made to improve the
knowledge of bivalve mitochondrial genomes.

Another interesting feature of bivalve mtDNA is its unusual transmission route, which is found in some
species: while in Metazoa mtDNA is known to be usually transmitted by Strict Maternal Inheritance (SMI;
[11,12]), some bivalve mollusks show a deviation from this rule, named Doubly Uniparental Inheritance (DUI;
[13,14]). DUI was found in species belonging to seven different bivalve families: Donacidae, Hyriidae,
Margaritiferidae, Mytilidae, Solenidae, Unionidae, and Veneridae ([15,16]). Species with DUI are
characterized by the presence of two distinct gender-associated mtDNAs: one transmitted through eggs (F)
and one transmitted through sperm (M). The F and M genomes show up to 52% nucleotide divergence [17]. DUI
seems at first to violate the universal rule of uniparental inheritance of organelles, because males
receive their mtDNA from both parents and their tissues are heteroplasmic. However, the two mtDNAs
segregate independently: the F-type is transmitted to the next generation only through females, while the
M-type is only transmitted from father to sons; therefore, both genomes are actually transmitted
uniparentally.

Because of its unique features, DUI should be a choice model to address many aspects of a wide range of
biological sub-fields such as mitochondrial inheritance, mtDNA evolution and recombination, genomic
conflicts, evolution of sex and developmental biology (see [18] for a review).

Recently, evidence for a new example of DUI was found in the mytilid Musculista senhousia [19]. In this
work we characterized the two sex-linked mitochondrial genomes of M. senhousia, a step forward to the
complete genetic characterization of DUI related sex-linked mitochondrial genomes. In fact, several unusual
features are coming to light when analyzing mtDNAs in DUI systems, such as additional protein coding genes
([20], and references therein) and gene duplications/features [21,22]. Functional explanations for these
features will require much additional work, but are needed to understand the evolution and maintenance of
DUI.

Results 

Mitochondrial genome features in M. senhousia 

The obtained M. senhousia mtDNAs are 21,557 bp long in female (F-type) and 20,612 bp in male (M-type) (see
Tables 1 and 2). Sequences are available in GenBank (Acc. No. GU001953-GU001954). The sizes of both F and M
mitochondrial genomes are within the size range of mollusk mtDNAs sequenced to date, i.e. from 7,808 bp in
Batillaria cumingi to 32,115 bp in Placopecten magellanicus (source MitoZoa: http://mi.caspur.it/mitozoa;
[3]).

M. senhousia F and M gene arrangements are remarkably different from other fully sequenced metazoan mtDNAs
(see [10] for a review). Genome annotations are reported in Figures 1 and 2 and Tables 1 and 2. When
compared to other Mytilidae, only four gene boundaries are shared with Mytilus (tRNAs are not considered),
i.e. rrnS-nad6, nad2-cox3, nad4L-nad5 and nad3-cox1, while the rest of the genome is different, thus
highlighting that gene arrangement evolves rapidly within the family.

Comparing the two sex-linked genomes, protein-coding genes may have different lengths (Table 3). Both
F-type and M-type include a large number of Unassigned Regions (URs; 29 in F and 27 in M: see Tables 1, 2
and Additional File 1). Among these, the largest (4,521 and 2,844 bp in female and male respectively) are
here referred to as LURs (i.e. Large Unassigned Regions).

Both F and M mt genomes show the same gene order 
and contain the full gene complement of the typical 
metazoan mtDNA, with two additional tRNAs: trnM 
and trnL (Figures 1 and 2; Tables 1 and 2). In males the 
cox2 gene is duplicated (Figure 2 and Table 2). 

The atp8 gene was reported as missing in several bivalve mollusks. However, as recently reported [23], the
apparent lack of atp8 is more likely an annotation inaccuracy due to the extreme variability of the gene.
Following [23], we found an atp8 gene in M. senhousia in both M and F genomes.

The position of the two ribosomal RNA genes, obtained through BLAST comparison, does not differ between
male and female. In both sexes, rrnL is located in a region flanked by the trnM(AUG) and nad3 genes.
Assuming that the first base at the 5'-end comes immediately after trnM(AUG), and that the 3'-end of the
gene corresponds to the first base upstream of the start codon of the nad3 gene, the lengths of the rrnL
genes are remarkably different: the male rrnL (1,682 bp in length) is 552 bp longer than the female one
(1,130 bp in length). The rrnS gene is located in a region flanked by the trnG and nad6 genes and, as
above, we assumed that the first base at the 5'-end comes immediately after trnG, and that the 3'-end of
the gene corresponds to the first base upstream of the start codon of the nad6 gene. Here, the difference
in length is reduced to 82 bp: the female rrnS gene is 818 bp long while the male one is 900 bp.

Table 1 Organization of female Musculista senhousia mitochondrial genome.

Type   Name        Starts   Stops    Length   Strand   Anticodon or start/stop codon
GENE   nad3        1        390      390      H        ATG TAA
UR     UR-1        [...]    [...]    [...]
tRNA   trnY        [...]    691      66       H        GTA
UR     UR-2        692      1234     [...]
tRNA   trnH        1235     1299     65       H        GTG
UR     UR-3        1300     1315     16
tRNA   trnI        1316     1381     66       H        GAT
UR     UR-4        1382     1391     10
tRNA   trnN        1392     1457     66       H        GTT
UR     UR-5        1458     1564     107
tRNA   trnE        1565     1631     67       H        TTC
LUR    LUR         1632     6152     4521
GENE   cox1        6153     7736     1584     H        ATG TAA
UR     UR-6        7737     8114     378
GENE   cox2        8115     8774     660      H        ATA TAA
UR     UR-7        8775     8832     58
GENE   atp8        8833     [...]    135      H        [...]
UR     UR-8        8968     9051     84
GENE   atp6        [...]    9765     714      H        ATG TAG
UR     UR-9        9766     9791     26
tRNA   trnT        9792     9858     67       H        TGT
GENE   cob         9835     11031    1197     H        ATA TAA
UR     UR-10       11032    11049    18
tRNA   trnD        11050    11114    65       H        GTC
UR     UR-11       11115    11123    9
tRNA   trnR        11124    11189    66       H        TCG
tRNA   trnS(AGN)   11191    11248    58       H        TCT
UR     UR-12       11249    11268    20
tRNA   trnG        11269    11336    68       H        TCC
rRNA   rrnS        11337    12154    818      H
GENE   nad6        12155    12778    624      H        ATG TAA
UR     UR-13       12779    12828    50
GENE   nad2        12829    13773    945      H        ATA TAA
UR     [...]       13774    13855    82
[...]  [...]       13856    14710    [...]    [...]    ATG TAA
UR     [...]       14711    14721    [...]
tRNA   trnK        14722    14792    71       H        TTT
UR     UR-16       14793    14797    5
tRNA   trnF        14798    14865    68       H        GAA
UR     UR-17       14866    14878    13
tRNA   trnP        14879    14945    67       H        TGG
UR     UR-18       14946    14977    32
tRNA   trnL(CUN)   14978    15042    65       H        TAG
UR     UR-19       15043    15047    5
tRNA   trnC        15048    15114    67       H        GCA
UR     UR-20       15115    15159    45
tRNA   trnL(UUR)   15160    15223    64       H        TAA
UR     UR-21       15224    15259    36
GENE   nad1        15260    16252    993      H        ATG TAA
UR     UR-22       16253    16385    133
tRNA   trnM(AUA)   16386    16448    63       H        TAT
[...]
UR     [...]       [...]    18843    [...]
GENE   nad4        18844    20163    [...]    H        ATA TAG
UR     UR-27       20164    20213    50
tRNA   trnW        20214    20280    67       H        TCA
UR     UR-28       20281    20285    5
tRNA   trnQ        20286    20353    68       H        TTG
UR     UR-29       20354    20360    7
tRNA   trnM(AUG)   20361    20427    67       H        CAT
rRNA   rrnL        20428    21557    1130     H

F and M genomes of M. senhousia contain 22 tRNA genes (see Tables 1, 2 and Additional File 2). As observed
in mtDNA of some other mollusks (Katharina tunicata, Cepaea nemoralis, Mytilus species complex and
Argopecten irradians), two leucine tRNA genes are present in M. senhousia. These can be differentiated by
their anticodons: TAA for trnL(UUR) and TAG for trnL(CUN), which are 2-fold and 4-fold redundant
respectively. Consequently, trnL is 6-fold redundant. An additional trnM was also detected, as in V.
philippinarum, Mytilus species complex, Crassostrea gigas, C. hongkongensis and C. virginica. The
additional tRNA coding for methionine, trnM(AUA), has the TAT anticodon.

In both male and female mtDNAs, trnS(AGN) has a shortened DHU arm (see Additional File 2), which is not
atypical, as this arm is unpaired in many metazoan taxa [24-27]. Moreover, mispairing between bases in
stems is consistent across several taxa. For example, the second base pair in the anticodon stem of trnW
is a T-T mispairing in Lampsilis ornata, Mytilus, and K. tunicata and a T-G pairing in several gastropods
[25].

In the F mitochondrial genome of Musculista, 20 out of 22 tRNA genes are clustered in five groups of two
to six (see Figure 1 and Table 1). Of the remaining two, trnT lies between atp6 and the 5'-end of cob
(trnT and cob overlap by 24 bp) while trnA lies between the nad5 and nad4 genes. Thus, 4 of the 13
protein-coding genes (cob, nad1, nad4L and nad4) have a tRNA preceding their 5'-end. In contrast, 7 other
genes (cox1, cox2, atp8, atp6, nad2, cox3 and nad5) have a non-coding sequence at their 5'-end that is
capable of forming a stem and loop structure (see Figure 3).
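The coordinate arithmetic behind statements such as the 24 bp trnT/cob overlap can be checked with a short sketch. The coordinates below are copied from Table 1 (female genome) and are treated as 1-based, inclusive positions; the dictionary and the helper names `feature_length` and `overlap_bp` are hypothetical, introduced here only for illustration.

```python
# Minimal sketch: feature lengths and overlaps from 1-based, inclusive coordinates.
# Coordinates are taken from Table 1 (female M. senhousia mtDNA); helper names are illustrative.

features = {
    "LUR": (1632, 6152),
    "cox1": (6153, 7736),
    "trnT": (9792, 9858),
    "cob": (9835, 11031),
}


def feature_length(start: int, stop: int) -> int:
    """Length in bp of a feature given inclusive start/stop positions."""
    return stop - start + 1


def overlap_bp(a: tuple, b: tuple) -> int:
    """Number of positions shared by two features (0 if they do not overlap)."""
    start = max(a[0], b[0])
    stop = min(a[1], b[1])
    return max(0, stop - start + 1)


print(feature_length(*features["LUR"]))               # 4521
print(feature_length(*features["cob"]))               # 1197
print(overlap_bp(features["trnT"], features["cob"]))  # 24
```

The same two helpers apply unchanged to the male coordinates in Table 2.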

In male mitochondrial DNA, 19 of the 22 tRNA genes are clustered in five groups ranging from two to six
(see Figure 2 and Table 2). Of the remaining three, trnT lies between atp6 and the 5'-end of cob (trnT and
cob overlap by 25 bp), trnA lies between the nad5 and nad4 genes and trnE lies between the large
unassigned region (LUR) and the 5'-end of the cox1 gene. Thus, 5 of the 14 protein-coding genes (cox1,
cob, nad1, nad4L and nad4) have a tRNA preceding their 5'-end, while 7 other genes (cox2b, cox2, atp8,
atp6, nad2, cox3 and nad5) have a non-coding sequence preceding their 5'-end that is capable of forming a
stem and loop structure (see Figure 3). In a few cases those structures contain the translation initiation
codon (cox1 and cox2 in females, nad2 in males).

The nucleotide compositions of the two genomes are summarized in Table 3. Given the G content of the F and
M coding strand (see Table 3), this can be considered as the heavy (H) strand of the molecule. The A+T
content of the H strand is also high (66.5%, F; 67.0%, M). Variable values of A+T content are common in
mollusks, and they have been reported in L. ornata (62%, [28]), Pupa strigosa (61.1%, [29]), and C.
nemoralis (59.8%, [25]). In other mollusks, the A+T content is much higher (Albinaria coerulea, 70.7%,
[30]; K. tunicata, 69.0%, [6]; Graptacme eborea, 74.1%, [31]). Musculista values in A+T content are among
the highest observed in the Phylum, and reflect the high heterogeneity of molluscan mtDNA [2]. Moreover,
there is a marked bias in favor of T against C, which is not restricted to any particular class of genes
and does not differ between the two genomes.

Table 2 Organization of male Musculista senhousia mitochondrial genome.

Type   Name    Starts   Stops    Length   Strand   [...]
[...]
tRNA   trnC    13626    13696    71       H        GCA
UR     UR-19   13697    13737    41
[...]

The GC and AT asymmetry between the two mitochondrial DNA strands can be expressed in terms of GC skew and
AT skew calculated according to [32]: GC skew = (G-C)/(G+C) and AT skew = (A-T)/(A+T), where G, C, A, and
T are the occurrences of the four bases in the H strand. In M. senhousia F and M mitochondrial genomes,
the GC skew and the AT skew are F: +0.28 and -0.18, and M: +0.23 and -0.17, respectively.
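As a worked check of the two skew formulas, the sketch below applies them to the whole-genome base compositions listed in the "complete" rows of Table 3. Percentages are used in place of raw base counts, which leaves the ratios unchanged; the function name `skew` and the dictionary layout are assumptions made for this illustration.

```python
# Minimal sketch: GC and AT skew from the H-strand base composition (Table 3, "complete" rows).
# Percentages stand in for base counts; a skew is a ratio, so the scaling cancels out.

composition = {
    "F": {"T": 39.3, "C": 12.0, "A": 27.2, "G": 21.4},
    "M": {"T": 39.3, "C": 12.7, "A": 27.7, "G": 20.3},
}


def skew(x: float, y: float) -> float:
    """Generic strand skew (x - y) / (x + y)."""
    return (x - y) / (x + y)


for genome, base in composition.items():
    gc_skew = skew(base["G"], base["C"])
    at_skew = skew(base["A"], base["T"])
    at_content = base["A"] + base["T"]
    print(f"{genome}: GC skew {gc_skew:+.2f}, AT skew {at_skew:+.2f}, A+T {at_content:.1f}%")
```

Run on these values, the sketch reproduces the skews quoted above as well as the A+T contents of 66.5% (F) and 67.0% (M).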

In the M. senhousia male mtDNA 6 out of 14 protein genes start with the ATA codon and 8 with ATG, while in
the female 7 out of 13 start with ATG and 6 with ATA (Tables 1 and 2). This pattern differs from that
observed for Mytilus galloprovincialis, where 9 out of 13 protein genes start with the ATG codon, 2 with
ATA and 2 with GTG [23,33]. In all known metazoan mtDNAs, the most common start codon is ATG, and it is
generally held that the methionine tRNA with the CAT anticodon represents the ancestral form. Moreover,
[24] suggested that the second methionine tRNA arose by duplication. The F and M genomes of the venerid
Venerupis philippinarum also have two tRNA genes for methionine, but both have the ancestral CAT
anticodon. TAA is the termination codon ten times in F and nine times in M mtDNA, while TAG is a stop
codon two times in F, and four times in M. In both M and F genomes, the nad5 gene is terminated by an
incomplete termination codon T- (Tables 1 and 2), with its likely completion occurring by polyadenylation
after transcript processing [34].

A total of 4,098 and 3,794 amino acid residues are encoded by the male and female M. senhousia
mitochondrial genomes respectively (Table 4). All codons occur in both Musculista mitochondrial genomes
(Table 5). UUU (phenylalanine) is the most frequent codon, followed by UUA (leucine). UUU is also the most
frequent codon in M. galloprovincialis [33], in L. ornata [28] and in C. nemoralis [35], whereas UUA
(leucine) is most common in A. coerulea [30], P. strigosa [29], Roboastra europaea [36], G. eborea [31],
and K. tunicata [6]. These two codons are also the most frequently used in other invertebrate mtDNAs
[37-42]. UUU is also very frequent in basal chordates (e.g. amphioxus, Branchiostoma lanceolatum, [43]),
but not in most vertebrates, where CUA (e.g., Cyprinus, [44]; Homo sapiens, [45]) or AUU (e.g., Xenopus
laevis, [46]; Danio rerio, [47]) are the most frequent.

The least used codons in males are UCG (6), CCG (8) and CGG (8), while in females they are CCG (4), CGC
(7) and UAG (7). Of these, CGC is also among the least common in the mtDNA of other mollusks. Synonymous
codons, whether four-fold (4FD) or two-fold (2FD) degenerate, are recognized by the same tRNA, with the
exception of the methionine codons, which are recognized by different tRNAs (Table 5).

Moreover, 2,754 F and 2,967 M Musculista codons (72.6% and 72.4% in female and in male respectively) end
with an A or T, a more pronounced phenomenon than that observed for a typical invertebrate codon bias.
There is a strong bias against the use of C (9.3% and 11.3% in female and in male respectively) at the
third position nucleotide in all codons: in detail, for residues with a fourfold degenerate third
position, codon families ending with T are the most frequently used (46.7% and 46.6% in female and male
respectively). This is also the case for two-fold degenerate codons. In other words, in every case where
an amino acid residue can be specified by any NNY codon, both female and male M. senhousia mitochondrial
genomes have a much higher proportion of NNT than NNC. In fact, the female showed 44.7% of T and 9.3% of
C, with an NNT:NNC ratio of 4.8:1, while in the male the ratio is slightly lower: 3.9:1 (43.8% of T and
11.2% of C). At the second position, there is an even stronger bias in favor of T (45.4% and 44.2% in
female and male respectively) (see Table 6), like in M. edulis (43.5%), C. hongkongensis (42.5%), C. gigas
(42.3%) and C. virginica (43.0%).
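The percentages and ratios in this paragraph follow directly from the counts quoted in the text, as the sketch below illustrates. The codon totals are assumed to equal the numbers of encoded amino acid residues reported above (3,794 in F, 4,098 in M); the variable and function names are hypothetical.

```python
# Minimal sketch: third-position codon bias figures recomputed from values quoted in the text.
# Assumption: total codons = encoded amino acid residues (3,794 F; 4,098 M).

codons_ending_in_a_or_t = {"F": 2754, "M": 2967}
total_codons = {"F": 3794, "M": 4098}
third_position_percent = {"F": {"T": 44.7, "C": 9.3}, "M": {"T": 43.8, "C": 11.2}}


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100.0 * part / whole


def nnt_to_nnc_ratio(t_percent: float, c_percent: float) -> float:
    """Ratio of T-ending to C-ending codons at the third position."""
    return t_percent / c_percent


for genome in ("F", "M"):
    at_share = percent(codons_ending_in_a_or_t[genome], total_codons[genome])
    ratio = nnt_to_nnc_ratio(**{
        "t_percent": third_position_percent[genome]["T"],
        "c_percent": third_position_percent[genome]["C"],
    })
    print(f"{genome}: {at_share:.1f}% of codons end in A or T; NNT:NNC = {ratio:.1f}:1")
```

With these inputs the sketch gives 72.6% and 4.8:1 for the female genome and 72.4% and 3.9:1 for the male genome.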



Finally, in eight 2FD and seven 4FD codon families in females and in seven 2FD and seven 4FD codon
families in males, the most frequently used codon does not match the tRNA anticodon. This has been observed
in other metazoan mtDNA as well [46-50] and it suggests that strict codon-anticodon complementarity does
not affect the codon composition of the genome. Deviations from equal frequency of the four nucleotides in
4FD sites are common in the animal mtDNA and have been attributed to several factors, such as unequal
presence of the four nucleotides in the nucleotide pool, preference of the mitochondrial gamma DNA
polymerase for specific nucleotides, or asymmetrical mutation rate owing to different duration of exposure
of the lagging strand during replication [40,51-54].

Figure 2 Male Musculista senhousia mitochondrial genome. Gene map of the male Musculista senhousia
mitochondrial genome. Shortest URs (< 100 bp) are not indicated.

Comparing the two M. senhousia sex-linked genomes, the most conserved protein-coding genes are cox1 and
cob, and the least conserved are nad6 and atp8 (Table 4). Synonymous (Ks) and non-synonymous (Ka)
substitution values between the two genomes vary (Table 4). Ka is particularly low for cox1 (0.042),
whereas Ks is not (0.838), suggesting that this gene is under some selective constraint (Ka/Ks = 0.05).
The conservation of cox1 is common in animal mtDNA [55,56]. In the cob gene, both K values are lower than
average (Table 4), with a Ka/Ks ratio (0.10) that is close to that of the cox1 gene.
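The Ka/Ks ratios discussed here are simple quotients of the Ka and Ks values in Table 4. A minimal sketch for the two most conserved and two least conserved genes follows; the values are copied from Table 4, while the dictionary and the function name `ka_ks_ratio` are illustrative assumptions rather than part of the original analysis pipeline.

```python
# Minimal sketch: Ka/Ks ratios recomputed from the Ka and Ks values in Table 4.
# Only four genes are shown; names and structure are illustrative.

substitution_rates = {
    # gene: (Ka, Ks)
    "cox1": (0.042, 0.838),
    "cob": (0.034, 0.346),
    "atp8": (0.233, 0.581),
    "nad6": (0.268, 0.619),
}


def ka_ks_ratio(ka: float, ks: float) -> float:
    """Non-synonymous to synonymous divergence ratio; values well below 1 suggest constraint."""
    if ks == 0:
        raise ValueError("Ks is zero; the ratio is undefined")
    return ka / ks


for gene, (ka, ks) in sorted(substitution_rates.items(), key=lambda kv: ka_ks_ratio(*kv[1])):
    print(f"{gene}: Ka/Ks = {ka_ks_ratio(ka, ks):.2f}")
```

Sorted this way, cox1 (0.05) and cob (0.10) come first and atp8 (0.40) and nad6 (0.43) last, matching the ordering described in the text.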

The Large Unassigned Region (LUR) 

As mentioned, in the female genome the LUR (F-LUR) is 4,521 bp long and it is included between trnE and
the 5'-end of the cox1 gene (Figures 1 and 4, Table 1), while in the male it (M-LUR) is 2,844 bp long, and
included between the trnN and trnE genes (Figures 2 and 4, Table 2). Both start with a dissimilar
sequence/spacer, 20 and 237 bp long, respectively.

Table 3 Length, base composition and sequence divergence of M, F genes and URs in Musculista senhousia.

Gene/Region      F/M   Length   Base composition (%)          pD ± SE
                                T      C      A      G
UR1-27/LUR       M     4296     37.8   11.2   31.4   19.5     NA
UR1-29/LUR       F     6798     37.9   10.4   30.8   20.9     NA
rrnL             M     1682     37.3   12.6   30.8   19.3     0.343 ± 0.015
                 F     1130     35.8   13.5   30.4   20.3
rrnS             M     900      36.0   11.6   33.1   19.3     0.093 ± 0.009
                 F     818      37.2   11.0   32.2   19.7
all rRNA genes   M     2582     36.3   12.2   31.6   19.3     0.209 ± 0.010
                 F     1948     36.4   12.4   31.2   20.0
atp6             M     714      43.8   12.7   23.5   19.9     0.258 ± 0.016
                 F     714      42.2   12.9   23.8   21.1
atp8             M     192      42.2   14.1   27.6   16.1     0.281 ± 0.037
                 F     135      43.0   12.6   25.9   18.5
cox1             M     1584     38.3   15.9   24.7   21.1     0.180 ± 0.009
                 F     1584     40.0   14.4   24.4   21.3
cox2             M     690      36.7   15.2   26.7   21.4     0.264 ± 0.016
                 F     660      37.4   14.5   27.3   20.8
cox2b            M     813      35.9   14.1   28.7   21.3     0.267 ± 0.016*
                 F     NA       NA
cox3             M     855      42.0   13.1   23.3   21.6     0.220 ± 0.012
                 F     855      43.4   12.9   20.9   22.8
cob              M     1197     40.6   13.9   25.2   20.3     0.106 ± 0.009
                 F     1197     40.4   13.6   24.9   21.1
nad1             M     996      39.8   12.2   26.0   22.0     0.227 ± 0.014
                 F     993      41.3   11.5   24.4   23.2
nad2             M     945      44.9   10.8   24.4   19.9     0.302 ± 0.013
                 F     945      44.1   10.9   22.4   22.5
nad3             M     375      44.3   14.1   21.3   20.3     0.267 ± 0.021
                 F     390      45.6   12.6   21.0   20.8
nad4             M     1329     41.4   11.5   23.6   23.5     0.273 ± 0.013
                 F     1320     39.9   11.9   24.3   23.9
nad4L            M     216      43.5   8.8    24.5   23.1     0.199 ± 0.027
                 F     216      44.0   8.8    24.5   22.7
nad5             M     1765     39.5   13.2   27.9   19.4     0.285 ± 0.011
                 F     1750     38.7   13.3   25.7   22.3
nad6             M     624      43.8   11.4   25.6   19.2     0.217 ± 0.017
                 F     624      42.1   12.3   25.2   20.4
all proteins     M     12295    40.6   13.2   25.4   20.9     0.231 ± 0.004#
                 F     11383    40.9   12.8   24.1   22.1
complete         M     20612    39.3   12.7   27.7   20.3     NA
                 F     21557    39.3   12.0   27.2   21.4

UR = Unassigned Regions.
NA = Not Available.
pD = p-Distance.
SE = Standard Error.
*: pD between Mcox2 and Mcox2b genes.
#: Mcox2b gene was excluded from the computation of overall pD.

The F-LUR contains two large repeats (Figure 4: Rep1 and Rep2) about 2,150 bp long (2,149 bp Rep1; 2,151
bp Rep2), both subdividable into three regions: A, B and C (named A1, A2, B1, B2, C1 and C2; see Figure 4
and Additional File 3). Between Rep1 and Rep2, the A subregion is the most conserved (pD = 0.000, see
Table 6) while C is the most variable, although with a low pD (0.010 ± 0.005). Overall, Rep1 and Rep2 have
a pD of 0.004 ± 0.001. The region including the last 202 bp of the F-LUR shows some similarity (pD = 0.449
± 0.035) to the A subregions (A1 and A2); for this reason it is indicated here as subregion A'.

All the A-type subregions (A1, A2 and A') start with a 46 bp conserved motif, named here α, that contains
a 10 bp hairpin (αh; see Figure 5). Both the subunits C (C1 and C2) begin with a hairpin 27 bp long (Ch;
Figure 5). The M-LUR contains an A-like subregion showing a pD of 0.362 ± 0.032 from A1 and A2 (Table 6),
indicated as A" (Figure 4). A" starts with a 37 bp motif, here named α*, similar to α, but 9 bp shorter
and with three mutations that allow the formation of a longer hairpin, here named α*h (31 bp; Figure 5),
in comparison to the female hairpin αh. The M-LUR continues with the subunit B, which is the most
conserved region compared to the F-LUR, showing a pD from B1 and B2 of 0.098 ± 0.007 and 0.096 ± 0.007
respectively (Table 6). At the 3' end of B there is a motif, indicated as γ (Figure 4), that is similar to
the first part of the subunits C. γ is repeated four times in tandem. The length of γ1, γ2 and γ3 ranges
from 265 to 268 bp, while the last repeat, γ4, is truncated and measures 17 bp (Additional File 3; Figure
4). The pD among the γ motifs is low and ranges from 0.008 ± 0.005 in the female (between γC1 and γC2) to
0.019 ± 0.009 between γ1 and γ3 (Table 6). The pD of the γ motifs between male and female varies from
0.346 to 0.350 ± 0.027 (Table 6). At the 5' end of each γ motif a secondary structure is present (γ1h,
γ2h, γ3h and γ4h respectively; Figure 5): γ1h is 14 bp long, while the other three are 28 bp long. γ2h and
γ3h are identical, γ4h has a two-base mutation at the center of the loop and γ1h is identical to the upper
portion of γ4h (see Figure 5).
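The p-distances (pD) reported for the repeats are proportions of differing sites between aligned sequences. The sketch below shows that calculation on a toy alignment. The standard error uses a simple binomial approximation, sqrt(p(1-p)/n), which is an assumption made here for illustration: this section does not state how the reported SE values were obtained, and the toy sequences are invented, not taken from the LUR.

```python
# Minimal sketch: uncorrected p-distance between two aligned sequences.
# Assumption: SE is approximated by the binomial formula; the method behind the
# SE values reported in the text is not specified in this section.
import math


def p_distance(seq_a: str, seq_b: str) -> tuple:
    """Return (p, se) for two aligned sequences, skipping columns that contain a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        raise ValueError("no comparable sites")
    differences = sum(1 for a, b in columns if a != b)
    n_sites = len(columns)
    p = differences / n_sites
    se = math.sqrt(p * (1.0 - p) / n_sites)
    return p, se


# Toy example (invented sequences): 1 difference over 10 comparable sites.
p, se = p_distance("ACGTACGTAC", "ACGTTCGTAC")
print(f"pD = {p:.3f} ± {se:.3f}")
```

For real data the two sequences would first need to be aligned, for example the Rep1 and Rep2 subregions described above.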

Furthermore, in line with what has been found in other DUI bivalves, including Mytilus, an ORF coding for
121 amino acids has been found in the F-LUR of M. senhousia. This protein was proposed to have a
functional role in DUI. Detailed analyses of this novel DUI-related putative protein have been published
in a broader comparative context (see [20]).

Figure 3 Intergenic palindromes. Putative secondary structures preceding the 5'-end of some protein-coding
genes. (F) Female Musculista senhousia mitochondrial genome. (M) Male Musculista senhousia mitochondrial
genome. Panel labels: cox1 (dG=-1.70), cox2 (dG=-11.20), cox2b (dG=-3.30), atp8 (dG=-7.60), atp6
(dG=-9.20), nad2 (dG=-2.90), cox3 (dG=-1.40), nad5 (dG=-3.10); cox2 (dG=-2.30), atp8 (dG=-0.70), atp6
(dG=-3.60), nad2 (dG=-7.80), cox3 (dG=-1.10), nad5 (dG=-1.30).

The cox2 duplication in the male mtDNA 

The male mtDNA contains an extra copy of the cox2 
gene. This is not new for DUI animals, since the female 
mt genome of the marine clam V. philippinarum has a 
cox2 duplication as well (GenBank Acc. No. AB065375: 
Okazaki and Ueshima, unpublished). 

In the female Musculista, the cox2 gene (Fcox2) is 660 bp long and is flanked by the "cox1/UR-6" and
"UR-7/atp8" regions at the 5'- and 3'-end respectively (see Figure 1 and Table 1). In the male
mitochondrial genome, the two copies of cox2 are close to each other and linked by a short non-coding
region 41 bp long (UR-6). The two cox2 copies are located between the "cox1/UR-5" and "UR-7/atp8" regions;
the first is 813 bp long, while the second is 690 bp long (Figure 2 and Table 2).

Bayesian phylogenetic analyses on Fcox2, Mcox2(690 
bp), Mcox2(813 bp) genes and their homologous in 
Mytilus species, demonstrated that Fcox2 is more closely 
related to the shorter Mcox2 (690 bp), rather than to the 
longer one (Figure 6). For this reason, the 813 bp long 
Mcox2 seems to be an extra copy of the gene, and thus 
it is referred here as Mcox2b. 

Table 4 Genes, gene lengths and divergences in male
and female Musculista senhousia protein coding genes.

Protein gene   M aa    F aa    pD ± SE           Ks       Ka       Ka/Ks
atp6           238     238     0.228 ± 0.026     0.894    0.156    0.17
atp8           64      45      0.302 ± 0.070     0.581    0.233    0.40
cox1           528     528     0.053 ± 0.009     0.838    0.042    0.05
cox2           230     220     0.251 ± 0.027     0.877    0.178    0.20
cox2b*         271     NA      0.279 ± 0.029*    0.653*   0.223*   0.35*
cox3           285     285     0.155 ± 0.022     0.811    0.107    0.13
cob            399     399     0.058 ± 0.012     0.346    0.034    0.10
nad1           332     331     0.218 ± 0.022     0.670    0.145    0.22
nad2           315     315     0.306 ± 0.026     0.843    0.244    0.29
nad3           125     130     0.218 ± 0.034     0.964    0.162    0.17
nad4           443     440     0.243 ± 0.020     0.931    0.175    0.19
nad4L          72      72      0.183 ± 0.045     0.626    0.107    0.17
nad5           588     583     0.274 ± 0.018     0.862    0.208    0.24
nad6           208     208     0.324 ± 0.031     0.619    0.268    0.43
all proteins   4,098   3,794                     0.716    0.143    0.20

M aa and F aa = number of amino acids in male and female, respectively.
pD = p-Distances at the amino acid level.

Ks and Ka = divergence of protein genes in synonymous (Ks) and
non-synonymous (Ka) sites, respectively.
SE = Standard Error. 

Ka/Ks = ratio values between Ka and Ks. 

*: comparisons between Mcox2 and Mcox2b genes. 
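
As a consistency check on the last column, the Ka/Ks ratio can be recomputed from the Ks and Ka columns. The sketch below copies a few rows from Table 4; the dictionary layout and function name are illustrative, and small differences from the published ratios are expected because the table values are rounded.

```python
# (Ks, Ka, reported Ka/Ks) for a few genes, copied from Table 4.
table4 = {
    "atp6": (0.894, 0.156, 0.17),
    "atp8": (0.581, 0.233, 0.40),
    "cox1": (0.838, 0.042, 0.05),
    "cob":  (0.346, 0.034, 0.10),
    "nad1": (0.670, 0.145, 0.22),
    "nad6": (0.619, 0.268, 0.43),
}

def ka_ks(ks, ka):
    """Ratio of non-synonymous to synonymous divergence."""
    return ka / ks

ratios = {gene: ka_ks(ks, ka) for gene, (ks, ka, _) in table4.items()}

for gene, (ks, ka, reported) in table4.items():
    print(f"{gene}: computed {ratios[gene]:.2f}, reported {reported:.2f}")
```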

Discussion 

Gene content and order of F and M Mitochondrial 
genomes in M. senhousia 

In M. senhousia both M and F mtDNAs share the same 
gene content and order, except for a duplicated cox2 
gene in males, and include the typical gene content of 
bivalve mtDNA. It has to be noted, however, that a 
common feature of bivalves is the apparent lack of the 
atp8 gene. For instance, [2] mentioned that a lack of the 
atp8 gene is one of several unusual features of the Myti- 
lus mt sequence. The atp8 gene was considered missing 
for almost all bivalve species studied so far, including 
Crassostrea hongkongensis, C. gigas, C. virginica, Placo- 
pecten magellanicus, Argopecten irradians, Mizuhopecten 
yessoensis and Acanthocardia tuberculata. On the
contrary, the atp8 gene was found in Hiatella arctica, as
well as in the female mitochondrial genome of the
unionid bivalve L. ornata [28]. A remarkable observation
is that V. philippinarum, another species with DUI [57], 
was recently found to contain a putative atp8 gene [58], 
which was not found in the first analyses; nonetheless, 
this gene apparently encodes 37 amino acids only and 
therefore has a questionable gene function. Finally, [23]
examined ORFs from several bivalve mitochondrial genomes
and found two novel ORFs (F-orf-ur4 and M-orf-ur4)
in the largest unassigned region of F and M mytilid
genomes (UR-4: see [33]). BLASTN searches against EST_others
(all ESTs except human and mouse) showed that
both are transcribed in Mytilus spp. BLASTX and PSI-BLAST
searches using inferred amino acid sequences of
F-orf-ur4 and M-orf-ur4 failed to detect any significant
sequence similarity with known proteins, so the identity
of those putative proteins is still unclear. Further analyses
of structure and evolutionary patterns suggested that
the novel ORFs "represent good candidates for the previously
'missing' atp8 in mytilid mtDNAs" [23]. Therefore,
following [23], we also found putative atp8 genes
in both sex-linked mitochondrial genomes of M. senhousia.
Our atp8 genes share the same characteristics as the
above-mentioned proteins, so we are confident in annotating
them as Musculista atp8 genes.

Generally speaking, most mtDNAs are characterized
by strand asymmetry in terms of gene distribution. In
both M. senhousia mt genomes, all genes are transcribed 
from the same strand, i.e. the asymmetry is at its highest 
among Metazoa. Most marine bivalves also share this 
feature (Mytilus species-complex, C. gigas, C. virginica, 
C. hongkongensis and V. philippinarum). In contrast, 
this is not true for the two freshwater species L. ornata 
[28] and Inversidens japanensis (Acc. No. AB055625 and 
AB055624) (see also [59]). In other mollusks, a relatively 
small number of mitochondrial genes are transcribed 
from the second strand. The scaphopods G. eborea and 
S. lobatum are an exception, with about an equal num- 
ber of genes encoded by each strand [31,58]. The occur- 
rence of all genes in the same strand is a relatively rare 
phenomenon in metazoans and, in addition to bivalves, 
it has been reported in some annelids (Lumbricus terres- 
tris, [60]; Platynereis dumerilii, [61]) and brachiopods 
(Terebratulina retusa, [62]; Terebratalia transversa, [42]; 
Laqueus rubellus, [63]). Actually, almost 10% of the 
mitochondrial genomes examined to date do have all 
genes encoded in the same strand [10]. Moreover, most 
of the above mentioned groups, including Bivalvia, are 
also characterized by strong differences in gene content 
and/or gene order. This allowed [10] to suggest a possi- 
ble correlation between these two features. 

The trnS(AGN) could not be located with tRNAscan- 
SE [64] because of the absence of the DHU arm and 
therefore of a normal cloverleaf structure (see [27] for a 
detailed discussion), so we used the ARWEN software 
[65] to identify it. This unconventional tRNA was found 
also in several other animal groups ([27] and references 
therein), and it evolved very early in Metazoa [66]. In 
vitro analyses confirmed its functionality [67]. 

Table 7 reports the distribution of trnS(UCN) and
trnS(AGN) among bivalves (only complete mitochondrial
genomes included; source: http://mi.caspur.it/mitozoa,
see [3]). Most of the species (22) have both
tRNAs, 7 have only trnS(UCN) and 3 (including M. senhousia)
have only trnS(AGN). Placopecten magellanicus has two

Table 5 Codon usage in male and female Musculista senhousia mitochondrial genomes.

aa        Codon   N      %
          CGA     14
          CUG     41
          CCG     4      0.1
          CAG     26     0.7
          CGG     13     0.3
          CNG     84     2.2
Ile (I)   AUU     147    3.9
Thr (T)   ACU     54     1.4
Asn (N)   AAU     82     2.2
Ser (S)   AGU     71     1.9
          ANU     354    9.3
          AUC     41     1.1
          ACC     9      0.2
          AAC     27
          AGC     30     0.8
          ACA     29
          AUG     62     1.6
Ala (A)   GCU     88     2.3
Asp (D)   GAU     59     1.6
Gly (G)   GGU     102    2.7
          GNU     449    11.8
          GUC     24     0.6
          GCC     17     0.4
          GAC     15     0.4
          GGC     39     1.0
          GNC     95     2.5
          GUA     113    3.0
          GCA     44     1.2
Glu (E)   GAA     44     1.2
          GGA     43     1.1
          GNA     244    6.4
          GUG     84     2.2
          GCG     22     0.6
          GAG     49     1.3
          GGG     89     2.3
          GNG     244    6.4
Total             3792

aa        Codon   N      %
          ACU     61     1.5
          AGC     43     1.0
Met (M)   AUA     148    3.6
          ACA     35     0.9
          AGA     97     2.4
          ANA     384    9.4
          AUG     79     1.9
Total             4098

Codons that match the corresponding tRNA anticodon are bold and underlined.
aa: coded amino acid.
s.c: stop codon.

copies of trnS(UCN), while Mizuhopecten yessoensis 
seems to lack a Serine tRNA. [68] suggested that the 
secondary structure of a tRNA gene between a pair of 
protein genes is responsible for the precise cleavage of 
the polycistronic primary transcript. In the absence of a 
tRNA, this role can be played by a stem-loop structure, 
the 5'-end part of the gene itself, or a combination of 
the two. Potential hairpin structures at protein-protein 
gene junctions with no intervening tRNA have been 
reported in several studies (e.g., [6,33,39,69,70]). Our
analysis showed that putative hairpins are present
at all gene junctions lacking a tRNA, suggesting
a functional role for such intergenic sequences (Figure 3).

The Large Unassigned Region (LUR) and the sex-linked 
mt-DNA transmission 

Table 6 p-Distance (± Standard Error) of LURs repeats,
subregions and motifs.

The structures of the palindromes found in the F and M LURs
are reported in Figures 4 and 5. The presence of palindromes
within a mtDNA CR is not new; in fact, the
local fold symmetry created by the palindrome is
thought to provide the site for DNA-binding proteins
involved in the transcriptional machinery [71]. In more
detail, palindromic motifs (and in general inverted
repeats) have the potential to form single-stranded
stem-loop cruciform structures, which have been
reported to be essential for replication of circular genomes
in many prokaryotic and eukaryotic systems [72].
The redundancy of palindromic elements in the Musculista
male LUR, when compared to that of the female,
may be related to an increased replication rate
of the M mtDNA; we can also speculate that this feature
may have some role in the process by which sperm
mitochondrial DNA becomes dominant or exclusive in
the male germline, although we know that this is also
achieved through differential segregation during early
embryo development, and likely through a second,
stricter selection during primordial germ cell establishment
(see [73]). Nevertheless, the question of how
sperm mitochondrial DNA becomes the dominant or
exclusive component of the male germline in DUI species
still remains open, and may be the outcome of various
coordinated processes.
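
To make the notion of a palindromic motif concrete, the following minimal sketch lists perfect DNA palindromes, i.e. stretches that equal their own reverse complement. It is only an illustration: the function names and the toy sequence are assumptions, imperfect inverted repeats are not handled, and it is not the procedure used to produce Figures 4 and 5.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq.upper()))

def find_palindromes(seq, min_len=6, max_len=12):
    """List (start, motif) for every window that equals its own reverse
    complement. Perfect DNA palindromes have even length, so min_len
    should be even; lengths are scanned in steps of 2."""
    seq = seq.upper()
    hits = []
    for length in range(min_len, max_len + 1, 2):
        for start in range(len(seq) - length + 1):
            window = seq[start:start + length]
            if window == reverse_complement(window):
                hits.append((start, window))
    return hits

# Toy sequence (not from M. senhousia) built around the palindrome GAATTC.
toy = "CCGGAATTCCGG"
print(find_palindromes(toy, min_len=6, max_len=12))
```

Nested hits are reported separately, so a long palindrome also yields the shorter palindromes centred on the same position.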

The duplication of the cox2 gene 

One noteworthy finding of this analysis is the cox2 gene 
duplication in the male mtDNA, with the duplicated 
gene being longer than the original one, a feature that 
might be somehow related to DUI. In fact, an interesting
analogy is evident with unionid bivalves, in which
the male cox2 gene shows a 200-codon extension that
is absent in the female mtDNA. Such a feature is found
in all unionids analyzed so far, and it has been related
to DUI functioning [21,22,74-76]. Actually, [21,22] proposed
several hypotheses for the role the cox2 extension
may have for DUI, but all are dependent upon identify- 
ing a specific function for it, which is not a trivial task. 
Moreover, they detected in the male gonad a poly-ade- 
nylated mRNA transcript of the cox2 gene that includes 
the extension, and they concluded that the extension is 
protein-coding and functional. 

[21,22] also hypothesized that the COX2 protein 
extension might be involved in intracellular interactions 
determining the survival of the male mitochondrion. In 
other organisms, it has been shown that upon fertiliza- 
tion the sperm-derived mitochondria are targeted for 
elimination: a key process in sperm mitochondrial 
degradation is ubiquitination [77], in which mitochondria
of paternal derivation are tagged with ubiquitin
and then degraded. In Mytilus, in which a ubiquitin-like
process has been proposed, this degradation would
be sex-specific: the sperm-derived mitochondria survive 
in male embryos, whereas they are eliminated in 
females. All that considered, [21] proposed that the 
COX2 extension could be involved in blocking such 
elimination to ensure survival of the male mitochon- 
drion, or, alternatively, the extension could play a role 
in the segregation of male mitochondria to the gonad. 
In either case, it should be possible to detect the protein
product of the extension outside of the inner
mitochondrial membrane. An in situ hybridization
seemed to demonstrate that the unionid male COX2 is
present on both inner and outer membranes of the
sperm mitochondria (see Figure 4 in [74]).

Figure 4 Large Unassigned Regions (LURs). Schematic structure of female (F) and male (M) LURs in Musculista senhousia.

Figure 5 LUR palindromes. Sequences and structures of palindromic motifs located in the Musculista senhousia LURs. Motif labels with folding energies: ah (dG=-0.77), Ch (dG=-1.49), a*h (dG=-1.04), yih (dG=-2.64), y2/3h (dG=-10.41), y4h (dG=-10.41).

Following this rationale, we
hypothesize that the duplicated cox2b gene in male M.
senhousia may represent a variant of what is found in
unionoidean bivalves, with proper signals for DUI mitochondrial
tagging lying in the COX2 protein extension
of unionid bivalves, as well as in the duplicated COX2b
protein of Musculista. Support for this view comes
from the observation that an additional putative transmembrane
helix (TMH) is found in the 41-residue-long
tail of the Musculista COX2b, although this tail is considerably
shorter than the unionid one (200 amino acids).
Actually, five putative TMHs were found in the
unionid extended C-terminus of the male COX2, which
led the authors to hypothesize that it may have a functional
significance for male unionoidean bivalve reproductive
success [75,76].



By analogy, we suggest that the Musculista COX2b might have
some function related to mitochondrial tagging, like the
unionid COX2 extension. Further studies
are needed to clarify the role of such proteins
in the unusual DUI system of mitochondrial
inheritance. Actually, a duplication similar to the Musculista
one was also found in V. philippinarum, but
quite surprisingly in the female mtDNA (see unpublished
GenBank annotation). This suggests that cox2
duplication may be uncoupled from maleness. Moreover,
no Mytilus genomes show a similar situation for cox2 or
any other gene, so duplicated genes or a cox2 tail
may not be strictly necessary to sustain DUI.

Conclusions 

The characteristics of the Musculista sex-linked 
mtDNAs evidently add to the knowledge of DUI sys- 
tems, and highlight some unexpected features, shared 
among distantly related DUI species. Since it is com- 
monly accepted that DUI is rather a variation of Strict 



Figure 6 Bayesian tree for the cox2 genes. Cgi: Crassostrea gigas; Med: Mytilus edulis; Mga: Mytilus galloprovincialis; Mtr: Mytilus trossulus; Mse:
Musculista senhousia.

Table 7 Serine tRNA [trnS(UCN) and trnS(AGN)] in bivalves.

Columns: Taxonomy; Species (GenBank Acc. No.); Missing; UCN; AGN; UCN+AGN

Pteriomorphia
  Mytiloida; Mytiloidea; Mytilidae
    Crenellinae
      Musculista senhousia (GU001953)
    Mytilinae
      Mytilus edulis (AY823623)
      Mytilus galloprovincialis (AY363687)
      Mytilus trossulus (DQ198225)
  Ostreoida; Ostreoidea; Ostreidae
      Saccostrea mordax (FJ841968)
      Crassostrea angulata (FJ841965)
      Crassostrea ariakensis (FJ841964)
      Crassostrea gigas (NC_001276)
      Crassostrea hongkongensis (EU266073)
      Crassostrea iredalei (FJ841967)
      Crassostrea sikamea (FJ841966)
  Pectinoida; Pectinoidea; Pectinidae
      Mizuhopecten yessoensis (FJ595959)
      Chlamys farreri (EU715252)
      Mimachlamys nobilis (FJ595958)
      Placopecten magellanicus (NC_007234)*
      Argopecten irradians (NC_009687)
      Argopecten irradians irradians (DQ665851)

Heteroconchia
  Myoida; Hiatelloidea; Hiatellidae
      Hiatella arctica (NC_008451)
  Veneroida; Cardioidea; Cardiidae
      Acanthocardia tuberculata (NC_008452)
  Veneroida; Lucinoidea; Lucinidae
      Loripes lacteus (EF043341)
      Lucinella divaricata (EF043342)
  Veneroida; Tellinoidea; Solecurtidae
      Sinonovacula constricta (EU880278)
  Veneroida; Veneroidea; Veneridae
      Meretrix meretrix (GQ463598)
      Meretrix petechialis (EU145977)
      Venerupis philippinarum (AB065374)
      Paphia euglypta (GU269271)

Palaeoheterodonta
  Unionoida; Unionoidea; Unionidae
    Ambleminae; Anodontinae; Anodontinae; Unioninae
      Venustaconcha ellipsiformis (FJ809752)
      Quadrula quadrula (FJ809750)
      Cristaria plicata (FJ986302)
      Pyganodon grandis (FJ809754)
      Hyriopsis cumingii (FJ529186)
      Inversidens japanensis (AB055624)
      Unio pictorum (HM014131)

*: Placopecten magellanicus has two copies of trnS(UCN)

Note: only species with complete mitochondrial genomes available included.

Maternal Inheritance, than a completely different
mechanism, we think that DUI is a good experimental
model to better understand the general rules, as well as
the molecular features, of Metazoan mitochondrial
inheritance (see [18] for a detailed discussion). For the
above-mentioned reasons, the complete mtDNA genome
characterization of DUI bivalves is not a mere
descriptive exercise, but rather a first step to unravel the
complex genetic signals allowing Doubly Uniparental
Inheritance of mitochondrial DNA, and the evolutionary
implications of such an unusual transmission route in
mitochondrial genome evolution in Bivalvia.

Methods 

Sample Collection 

Live M. senhousia specimens from Venice Lagoon
(Italy) were used for this analysis. Males and females 
were stimulated to spawn gametes in seawater supple- 
mented with hydrogen peroxide, according to [78]. Each 
emission was analyzed with a light microscope to sex 
specimens. A total of 10 sperm and 10 egg samples 
were then collected after a gentle centrifugation (3,000 
g). Seawater was removed, and ethanol added before 
storing samples at -20°C. 

PCR analyses 

Total genomic DNA was extracted using the DNeasy
Tissue Kit (Qiagen), and partial sequences of cytochrome
b (cob) and mitochondrial ribosomal large subunit
RNA (rrnL) were amplified and directly sequenced
(primers reported in Table 8), as described in [79].
Sequencing reactions were performed on both strands
with the BigDye Terminator Cycle Sequencing Kit according
to the supplier's instructions (Applied Biosystems) in a 310
Genetic Analyzer (ABI) automatic sequencer.

The 20 sequences obtained for both F and M gen- 
omes were aligned (not shown), and, after checking for 
variable sites, used to design sex-specific primers to 
amplify the entire mitochondrial genome in two over- 
lapping fragments by long PCR reactions. LongPCR was 
performed on one Musculista specimen per sex. To 
obtain the F genome, F-cob383R and F-16S142F 



Table 8 Primer sequences.

Primer name   Sequence
cobR ¹        5'-GCRTAWGCRAAWARRAARTAYCAYTCWGG-3'
cobF ¹        5'-GGWTAYGTWYTWCCWTGRGGWCARAT-3'
16Sbr ²       5'-CCGGTCTGAACTCAGATCACGT-3'
16Sar ²       5'-CGCCTGTTTATCAAAAACAT-3'
F-cob383R     5'-TAGGAG I I I I I ATAGGGTCTGC-3'
F-16S142F     5'-ACCTGAAGTOTCTCAmACC-3'
M-cob386R     5'-GGATAGGAG I I I I I ATAGGGTCTGC-3'
M-16S103F     5'-GTGAAmc™GAGTGACGA™-3'

¹ J.L. Boore, personal communication; ² [88]



primers were used. The M genome was amplified with
M-cob386R and M-16S103F. Each pair of primers
amplified a fragment of 10-11 kb. Long
PCR primer sequences are reported in Table 8.
LongPCR amplifications were performed on a GeneAmp®
PCR System 2720 (Applied Biosystems) in a 50 µl
reaction volume composed of 31.5 µl of sterilized distilled
water, 10 µl of 5× Herculase II Fusion Reaction
Buffer, 0.5 µl of dNTPs mix (25 mM each dNTP), 1.25
µl of each primer (10 µM), 5 µl of DNA template (25-50 ng)
and 0.5 µl of Herculase II Fusion DNA Polymerase.
Reaction conditions followed the supplier's
recommendations: initial denaturation at 95°C for 5 min,
then 30 cycles of 95°C for 20 s, 50°C for 20 s, and
68°C for 10 min, and a final extension at 68°C for 8 min.
Long-PCR fragments were then purified
using the Wizard® SV Gel and PCR Clean-Up System
(Promega).
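
The reaction mix above can be checked with a short dilution calculation. The sketch below uses the volumes and stock concentrations listed in the text; the component labels and the helper function are illustrative names, and the final concentrations are derived values, not figures stated by the authors.

```python
# Reaction components in microlitres, as listed in the text.
components_ul = {
    "water": 31.5,
    "5x buffer": 10.0,
    "dNTP mix": 0.5,
    "primer 1": 1.25,
    "primer 2": 1.25,
    "DNA template": 5.0,
    "polymerase": 0.5,
}
total_ul = sum(components_ul.values())

def final_concentration(stock, volume_ul, total_volume_ul):
    """Simple dilution: C_final = C_stock * V_added / V_total."""
    return stock * volume_ul / total_volume_ul

# Stock concentrations from the text: 25 mM each dNTP, 10 uM primers, 5x buffer.
dntp_mM = final_concentration(25.0, components_ul["dNTP mix"], total_ul)
primer_uM = final_concentration(10.0, components_ul["primer 1"], total_ul)
buffer_x = final_concentration(5.0, components_ul["5x buffer"], total_ul)

print(total_ul, dntp_mM, primer_uM, buffer_x)
```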

Shotgun cloning 

Sequencing of the two LongPCR fragments was done
using shotgun cloning: amplicons were randomly
sheared to 1.2-1.5 kb DNA segments using a HydroShear
device (GeneMachines). Sheared DNA was blunt-end
repaired at room temperature for 60 min using 6 U
of T4 DNA Polymerase (Roche), 30 U of DNA Polymerase
I Klenow (NEB), 10 µl of dNTPs mix and 13 µl of 10×
NEB buffer 2 in a 115 µl total volume, and then gel-purified
using the Wizard® SV Gel and PCR Clean-Up System
(Promega). The resulting fragments were ligated
into the SmaI site of a pUC18 cloning vector using the
Fast-Link DNA Ligation Kit (Epicentre) and electroporated
into One Shot® TOP10 Electrocomp™ Escherichia
coli cells (Invitrogen) using standard protocols. Clones
were screened by PCR using M13 universal primers and
recombinants were purified using Multiscreen (Millipore)
according to the manufacturer's instructions.
Clones were sequenced using M13 universal primers by
Macrogen Inc. (Korea).

Raw sequences were manually corrected, and then 
assembled into contigs with Sequencher v.4.6 (Gene 
Codes). Hence, the final assemblies were based on a 
minimum sequence coverage of 3x. 

Secondary structures and annotation 

The tRNA genes were identified by their secondary 
structure using ARWEN [65], with invertebrate mito- 
chondrial codon predictors. Analysis of Open Reading 
Frames (ORFs) was performed with the ORF Finder
program of NCBI (http://www.ncbi.nlm.nih.gov/projects/gorf/)
using the invertebrate mitochondrial genetic code.
Sequences were identified using BLASTX, PSI-BLAST 
[80] and BLASTN [81] as implemented by the NCBI 
website http://www.ncbi.nlm.nih.gov/. 
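
For illustration, an ORF scan under the invertebrate mitochondrial genetic code (NCBI translation table 5) can be sketched as follows. This is not the ORF Finder algorithm: the function name, the minimum-length threshold and the toy sequence are assumptions, and only the forward strand is scanned, which fits genomes such as these in which all genes are transcribed from the same strand.

```python
# NCBI translation table 5 (invertebrate mitochondrial): start and stop codons.
START_CODONS = {"ATG", "ATA", "ATT", "ATC", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG"}  # in this code TGA is Trp and AGA/AGG are Ser

def find_orfs(seq, min_codons=30):
    """Return (start, end, frame) for forward-strand ORFs.

    An ORF runs from the first start codon found after the previous stop
    to the next in-frame stop codon; 'end' is exclusive and includes the
    stop codon. min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        orf_start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if orf_start is None and codon in START_CODONS:
                orf_start = pos
            elif orf_start is not None and codon in STOP_CODONS:
                if (pos - orf_start) // 3 >= min_codons:
                    orfs.append((orf_start, pos + 3, frame))
                orf_start = None
    return orfs

# Toy sequence (hypothetical): the internal TGA does not end the ORF.
toy = "CC" + "ATG" + "GCT" * 4 + "TGA" + "GCT" + "TAA" + "GG"
print(find_orfs(toy, min_codons=5))
```

Under the standard code the TGA in the toy sequence would act as a stop; here the ORF continues to the TAA, which is why the choice of genetic code matters for annotation.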

For all protein coding genes, alignments were computed
with ClustalW [82].

When analyzing sequence variability, pairwise p-Distances
(pD), their mean values and standard errors (by
the bootstrap procedure) were computed with MEGA
v.5.03 [83]. The uncorrected pD was preferred so that
the statistics would not depend on any particular model
of DNA substitution (see [79]).
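
The p-distance is simply the proportion of aligned sites that differ. The sketch below shows that calculation together with a column-resampling bootstrap for the standard error; it is a simplified illustration with hypothetical function names, and MEGA's implementation may differ in details such as gap handling and the number of replicates.

```python
import random

def comparable_sites(a, b):
    """Aligned site pairs, skipping any site with a gap ('-') in either sequence."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]

def p_distance(a, b):
    """Proportion of comparable aligned sites at which the sequences differ."""
    pairs = comparable_sites(a, b)
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)

def bootstrap_se(a, b, replicates=500, seed=0):
    """Standard error of the p-distance, resampling alignment columns
    with replacement (seeded so the result is reproducible)."""
    rng = random.Random(seed)
    pairs = comparable_sites(a, b)
    n = len(pairs)
    estimates = []
    for _ in range(replicates):
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        estimates.append(sum(x != y for x, y in sample) / n)
    mean = sum(estimates) / replicates
    variance = sum((e - mean) ** 2 for e in estimates) / (replicates - 1)
    return variance ** 0.5

print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))
```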

The divergence of protein genes in synonymous (Ks) 
and non-synonymous (Ka) sites was calculated by the 
modified Nei-Gojobori method with the Jukes-Cantor 
correction; the pD at the residue level was also calcu- 
lated within the MEGA v.5.03 environment [83]. 
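
The Jukes-Cantor correction mentioned above converts an observed proportion of differences p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The sketch below shows only this final step; the counting of synonymous and non-synonymous sites and differences that precedes it in the modified Nei-Gojobori method is omitted, and the function name is illustrative.

```python
import math

def jukes_cantor(p):
    """Jukes-Cantor corrected distance from a proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# The corrected distance always exceeds the raw proportion for p > 0,
# because multiple substitutions at the same site are accounted for.
for p in (0.05, 0.15, 0.30):
    print(p, round(jukes_cantor(p), 4))
```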

Two-fold and four-fold degenerate positions were
identified using DnaSP v.5 [84]. The Sequence Manipulation
Suite (http://www.bioinformatics.org/sms2; [85])
was used to estimate codon usage. Potential DNA secondary
structures near or at the 5'-end of protein genes
were predicted using the UNAFold software package
[86] available on the DINAMelt web server
(http://mfold.rna.albany.edu/?q=DINAMelt; [86]).
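
Codon usage of the kind summarized in Table 5 amounts to counting in-frame codons and expressing each count as a percentage of the total. The following is a minimal sketch of that idea, not the Sequence Manipulation Suite itself; the function name and the toy coding sequence are hypothetical.

```python
from collections import Counter

def codon_usage(cds):
    """Count in-frame codons and return {codon: (count, percent)}.

    Codons are reported in the RNA alphabet (U instead of T); any
    trailing bases that do not fill a complete codon are ignored.
    """
    cds = cds.upper().replace("T", "U")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: (n, 100.0 * n / total) for codon, n in counts.items()}

# Toy coding sequence (hypothetical): AUG GCU GCU UAA.
usage = codon_usage("ATGGCTGCTTAA")
for codon, (n, pct) in sorted(usage.items()):
    print(codon, n, f"{pct:.1f}")
```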

Bayesian analysis of the cox2 genes was performed with
MrBayes 3.1 (5,000,000 generations; [87]).

Additional material 